Software-Hardware Cooperative DRAM Bank Partitioning for Chip Multiprocessors
Authors
Abstract
DRAM row buffer conflicts can increase memory access latency significantly for single-threaded applications. In a chip multiprocessor system, multiple applications competing for DRAM suffer additional row buffer conflicts due to interthread interference. This paper presents a new hardware and software cooperative DRAM bank partitioning method that combines page coloring and XOR cache mapping to evaluate the potential benefit of reducing interthread interference. Using SPECfp2000 as our benchmarks, our simulation results show that our scheme can boost the performance of most of the benchmark combinations tested, with speedups of up to 13%, 14%, and 8.06% observed for two cores (with 16 banks), two cores (with 32 banks), and four cores (with...
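To make the page-coloring side of the idea concrete, the sketch below shows how an OS allocator could restrict each core's page frames to a private set of banks when the bank index is derived from physical-address bits and XOR-permuted with row bits. The address layout (4 KiB pages, 16 banks, bit positions), the contiguous per-core bank split, and the helper names (bank_of, frame_allowed) are illustrative assumptions for this sketch, not the mapping used in the paper.

```c
/*
 * Minimal sketch of OS page coloring for DRAM bank partitioning with an
 * XOR-permuted bank index.  The address layout and the per-core bank
 * split are assumptions made for illustration only.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                    /* 4 KiB pages (assumption)          */
#define BANK_BITS  4                     /* 16 banks, as in the 2-core setup  */
#define NUM_BANKS  (1u << BANK_BITS)
#define ROW_SHIFT  17                    /* hypothetical start of row bits    */

/* Bank index = low page-frame bits XOR low row bits (permutation mapping). */
static unsigned bank_of(uint64_t paddr)
{
    unsigned frame_bits = (unsigned)(paddr >> PAGE_SHIFT) & (NUM_BANKS - 1);
    unsigned row_bits   = (unsigned)(paddr >> ROW_SHIFT)  & (NUM_BANKS - 1);
    return frame_bits ^ row_bits;
}

/*
 * Page-coloring check: a free page frame may be handed to a core only if
 * its bank index falls inside that core's private, contiguous bank range,
 * so co-running applications never share a row buffer.
 */
static int frame_allowed(uint64_t frame, unsigned core, unsigned banks_per_core)
{
    return bank_of(frame << PAGE_SHIFT) / banks_per_core == core;
}

int main(void)
{
    unsigned banks_per_core = NUM_BANKS / 2;        /* two cores, 16 banks */
    for (uint64_t frame = 0; frame < 8; frame++)
        printf("frame %llu -> bank %2u, core 0 may use it: %d\n",
               (unsigned long long)frame,
               bank_of(frame << PAGE_SHIFT),
               frame_allowed(frame, 0, banks_per_core));
    return 0;
}
```

Under these assumptions, core 0 only ever receives frames that map to banks 0-7 and core 1 to banks 8-15, so row buffer conflicts can only arise within a core's own access stream.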
Similar Sources
Jigsaw: Scalable Software-Defined Caches (Extended Version)
Shared last-level caches, widely used in chip-multiprocessors (CMPs), face two fundamental limitations. First, the latency and energy of shared caches degrade as the system scales up. Second, when multiple workloads share the CMP, they suffer from interference in shared cache accesses. Unfortunately, prior research addressing one issue either ignores or worsens the other: NUCA techniques reduce...
Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors
Translation Lookaside Buffers (TLBs) are commonly employed in modern processor designs and have considerable impact on overall system performance. A number of past works have studied TLB designs to lower access times and miss rates, specifically for uniprocessors. With the growing dominance of chip multiprocessors (CMPs), it is necessary to examine TLB performance in the context of parallel wor...
Application-aware Adaptive DRAM Bank Partitioning in CMP
Main memory is a shared resource among the cores in a chip, and the speed gap between cores and main memory limits total system performance. Thus, main memory should be accessed effectively by each core. Exploiting both the parallelism and the locality of main memory is the key to realizing efficient memory access. The parallelism between memory banks can hide latency by pipelining memory accesses...
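As a rough illustration of that pipelining effect, the toy model below compares N accesses that all hit one bank with N accesses spread across N banks; the cycle counts and the assumption that only the shared data bus serializes the spread case are made up for this sketch.

```c
/* Toy latency model of bank-level parallelism; the cycle counts below are
 * hypothetical and only meant to show the pipelining effect. */
#include <stdio.h>

#define T_BANK 50   /* cycles a bank is busy per access (assumed)          */
#define T_BUS   4   /* cycles of shared data-bus time per access (assumed) */

/* N accesses to a single bank serialize on that bank. */
static int same_bank(int n)    { return n * T_BANK; }

/* N accesses to N different banks overlap; only the bus serializes them. */
static int spread_banks(int n) { return T_BANK + (n - 1) * T_BUS; }

int main(void)
{
    for (int n = 1; n <= 8; n *= 2)
        printf("%d accesses: one bank = %3d cycles, %d banks = %3d cycles\n",
               n, same_bank(n), n, spread_banks(n));
    return 0;
}
```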
Joint Exploration of Hardware Prefetching and Bandwidth Partitioning in Chip Multiprocessors
In this paper, we propose an analytical model-based study to investigate how hardware prefetching and memory bandwidth partitioning impact Chip Multi-Processors (CMP) system performance and how they interact. The model includes a composite prefetching metric that can help determine under which conditions prefetching can improve system performance, a bandwidth partitioning model that takes into ...
Adaptive Zone-Aware Multi-bank on Chip last level L2 Cache Partitioning for Chip Multiprocessors
This paper proposes a novel efficient Non-Uniform Cache Architecture (NUCA) scheme for the Last-Level Cache (LLC) to reduce the average on-chip access latency and improve core isolation in Chip Multiprocessors (CMP). The architecture proposed is expected to improve upon the various NUCA schemes proposed so far such as S-NUCA, D-NUCA and SP-NUCA[9][10][5] in terms of average access latency witho...